Trends in Hearing — Latest Matching Preprints

1

Elliptical speech reveals the use of broad phonetic categories aids noise-degraded speech perception

Bidelman, G.; Eisenhut, Z.; Borowski, L.; Rizzi, R.; Pisoni, D. B.

2026-01-02 neuroscience 10.64898/2026.01.02.695202 medRxiv

Top 0.1%

19.8%

Show abstract

PurposeSpeech perception requires that listeners classify sensory information into smaller groupings while also coping with noise that often corrupts the speech signal. The strength of categorization and speech-in-noise (SIN) abilities show stark individual differences. Some listeners perceive speech sounds in a gradient fashion, while others categorize in a discrete/binary manner, favoring fine acoustic details vs. a more abstract phonetic code, respectively. Prior work suggests SIN processing is (i) related to more gradient phonetic perception and (ii) varies with musical training. MethodTo further probe relations between perceptual gradiency and noise-degraded listening, we measured phoneme categorization, SIN recognition (QuickSIN), and sentence recognition in listeners with varying musical backgrounds. Categorization was measured for vowels and stops using standard labeling tasks. Speech recognition and discrimination were assessed using "elliptical speech" sentences that use featural substitutions which renders them meaningless under clean conditions but surprisingly improves their recognition under noise degradation. We hypothesized listeners who use broader perceptual equivalency classes in hearing elliptical speech would show better SIN perception, indicative of a more gradient listening strategy. ResultsListeners perceived elliptical sentences as sounding different than their intact counterparts in the clear but as the same under noise degradation. But this elliptical benefit varied with music background. Nonmusicians showed larger susceptibility and noise-related benefit of ellipses than musicians, consistent with the notion they used broader phonetic categories (i.e., more gradient listening). Elliptical speech perception was also associated with QuickSIN performance in both groups but in opposite ways. ConclusionsUse of broader categories was related to better SIN processing in nonmusicians but poorer SIN processing in musicians. Findings suggest listeners can use broader perceptual equivalence classes to deal with degraded listening situations but this depends critically on their auditory demographics. Nonmusicians might use broader phonetic categories to aid SIN perception while musicians might use narrower categories or otherwise similar speech contexts.

2

Spatial auditory change detection in listeners with hearing loss

Poole, K. C.; With, S.; Martin, V.; Chait, M.; Picinali, L.; Shiell, M. M.

2025-12-12 neuroscience 10.64898/2025.12.10.693419 medRxiv

Top 0.1%

18.6%

Show abstract

Everyday listening relies on the auditory systems ability to automatically monitor the background soundscape and detect new or changing sources. Although change detection is a fundamental aspect of situational awareness, little is known about how hearing impairment affects this ability. This study examined how sensorineural hearing loss influences spatial auditory change detection. Older hearing-impaired listeners (N = 30) completed a spatial change detection task requiring them to identify the appearance of a new sound source within a complex spatialised acoustic scene. Hearing loss was characterised by three factors that were measured with standard clinical tests: audiometric hearing thresholds, sensitivity to small level changes, and sensitivity to spectrotemporal modulation. Simple and mixed-effects linear models were used to test how these factors predicted reaction time, hit rate, and false alarm rate. Listeners with poorer spectrotemporal sensitivity, higher audiometric hearing thresholds, and older age showed slower and less accurate detection, whereas sensitivity to small changes in level did not predict outcomes. Detection also varied with spatial location, where appearing sources from behind were detected more slowly and less accurately than those from the front or sides. Numerical analysis using head-related transfer functions confirmed that these rear-field effects were unlikely to be explained by overall or frequency-specific acoustic level differences. These findings reveal that hearing loss, age, and spatial factors jointly shape listeners ability to monitor dynamic auditory scenes. Additionally, testing spectrotemporal sensitivity offers a promising clinical measure of non-speech auditory processing with relevance for hearing-aid fitting and situational awareness.

3

When feedback backfires: investigating neurofeedback effects in a closed-loop auditory attention decoding paradigm

Rotaru, I.; Geirnaert, S.; Heintz, N.; Bertrand, A.; Francart, T.

2026-04-30 neuroscience 10.64898/2026.04.28.721343 medRxiv

Top 0.1%

18.5%

Show abstract

Selective auditory attention decoding (AAD) enables tracking which of multiple concurrent speakers a listener attends to and is a key building block for neuro-steered hearing devices. While AAD integrated in a closed-loop system with real-time neurofeedback (NFB) is hypothesized to improve decoding through neural adaptation and error-correction behaviour, the short-term behavioral and algorithmic impact of such a bilateral human-machine interaction remains poorly understood. Here we evaluated the effects of NFB on AAD accuracy and user experience in a single-session AAD paradigm with online NFB involving nineteen participants. They performed a selective listening task with enforced attention switches across four conditions: open-loop (OL), closed-loop with auditory gain feedback (CLA), closed-loop with visual feedback (CLV), and a condition with pseudo-auditory gain control (psCLA) decoupled from the participants individual neural activity. AAD was performed online using both subject-specific and subject-independent linear decoders on 5 s sliding windows, followed by Hidden Markov Model post-processing. Online analysis showed comparable decoding performance across all conditions. However, offline posthoc analysis using subject-independent decoders revealed that AAD accuracy in the CLA condition was significantly lower than in the OL baseline. Subjectively, participants reported that CLA was significantly more distracting and required higher switching effort. Crucially, a causal analysis of the psCLA condition found no robust evidence that higher audio gains inherently improve decoding accuracy. Our results demonstrate that within a single-session paradigm with rapidly varying feedback cues, auditory neurofeedback may degrade AAD performance by increasing cognitive load and distraction. These findings suggest that suboptimal feedback can impede rather than facilitate learning. We conclude that more accurate and stable decoders and longitudinal, multi-session training protocols are likely essential prerequisites for achieving beneficial neurofeedback effects in closed-loop auditory attention systems.

4

Visual contributions to the perception of speech in noise

Alampounti, L. C.; Rosen, S.; Cooper, H.; Bizley, J. K.

2025-09-27 neuroscience 10.1101/2025.09.26.678636 medRxiv

Top 0.1%

18.2%

Show abstract

Investigations of the role of audiovisual integration in speech-in-noise perception have largely focused on the benefits provided by lipreading cues. Nonetheless, audiovisual temporal coherence can offer a complementary advantage in auditory selective attention tasks. We developed an audiovisual speech-in-noise test to assess the benefit of visually conveyed phonetic information and visual contributions to auditory streaming. The test was a video version of the Childrens Coordinate Response Measure with a noun as the second keyword (vCCRMn). The vCCRMn allowed us to measure speech reception thresholds in the presence of two competing talkers under three visual conditions: a full naturalistic video (AV), a video which was interrupted during the target word presentation (Inter), thus, providing no lipreading cues, and a static image of a talker with audio only (A). In each case, the video/image could display either the target talker, or one of the two competing maskers. We assessed speech reception thresholds in each visual condition in 37 young ([≤] 35 years old) normal-hearing participants. Lipreading ability was independently assessed with the Test of Adult Speechreading (TAS). Results showed that both target-coherent AV and Inter visual conditions offer participants a listening benefit over the static image audio-only condition, with the full AV target-coherent condition providing the most benefit. Lipreading ability correlated with the audiovisual benefit shown in the full AV target-coherent condition, but not the benefit in the Inter target-coherent condition. Together our results are consistent with visual information providing independent benefits to listening, through lip reading and enhanced auditory streaming.

5

Neural Tracking of Audiovisual Effects in Noise Using Deep Neural Network-Generated Virtual Humans

Cooper, J. K.; Vanthornhout, J.; van Wieringen, A.; Francart, T.

2025-06-05 neuroscience 10.1101/2025.06.02.656280 medRxiv

Top 0.1%

14.7%

Show abstract

This study investigates the effectiveness of Deep Neural Network (DNN)-generated virtual humans in enhancing audiovisual speech perception in noisy environments, with a focus on using neural measures to quantify these effects. Lip movements are essential for speech comprehension, especially when auditory cues are degraded by noise. Traditional recording methods produce high-quality audiovisual materials but are resource intensive. This research explores the use of DNN avatars as a promising alternative, utilizing a commercially available tool to create realistic virtual humans. The study included both simple sentences and a short story to improve ecological validity. Eleven young, normal-hearing participants proficient in Flemish-Dutch listened to semantically meaningful sentences and a short story with various speaker types: a female FACS avatar, male and female DNN avatars, and a video of a human male speaker. The study included behavioral measures which consisted of an adaptive recall procedure and an adaptive rate procedure and electrophysiological measures consisting of neural tracking. Findings in the adaptive recall procedure showed consistent audiovisual benefits, with the human speaker offering the greatest benefit (-4.75 dB SNR), followed by the DNN avatar (-4.00 dB SNR) and the FACS avatar (-1.55 dB SNR). Additionally in the adaptive rate procedure, the DNN avatar improved speech intelligibility, with average SRTs enhancing from -7.17 dB SNR (audio-only) to -9.02 dB SNR (audiovisual). The results from the neural tracking procedure indicated that most participants experienced audiovisual benefits, particularly in the -9 dB SNR range, revealing that audiovisual cues provided by DNN avatars can enhance speech perception, validating these avatars as effective tools for studying audiovisual effects when using both behavioral measures and electrophysiological measures.

6

Performance Analysis of Speech Recognition Models in Automated Scoring of the QuickSIN Test

Hassanpour, A.; Jiang, Y.; Folkeard, P.; Macpherson, E.; Scollie, S. D.; Parsa, V.

2025-07-25 otolaryngology 10.1101/2025.07.25.25332211 medRxiv

Top 0.1%

13.0%

Show abstract

PurposeBest practices in audiology recommend assessing speech understanding in noisy environments, especially for those with communication difficulties. Speech-in-noise (SiN) assessments such as the QuickSIN are used for validating signal processing in hearing aids (HAs) and are linked to HA satisfaction. This project seeks to enhance QuickSIN test efficiency by applying recent advancements in automatic speech recognition (ASR) technologies. MethodTwenty-three adults with sensorineural hearing loss were fitted bilaterally with Unitron Moxi HAs and were administered the QuickSIN test in low and high reverberation environments. Testing was performed with two different HA programs: an omnidirectional program and a fixed directional microphone program. QuickSIN sentences were presented from 0{degrees} azimuth and competing babble from either 0{degrees}, laterally from 90{degrees} or 270{degrees}, or simultaneously from 90{degrees}, 180{degrees}, and 270{degrees} azimuths. Participants verbal responses to QuickSIN stimuli were scored by an audiologist and were recorded in parallel for offline transcription and scoring by ASR models from Amazon, Microsoft, NVIDIA, and Picovoice. The ASR-derived QuickSIN scores were compared to the corresponding audiologist-derived scores. ResultsRepeated Measures ANOVA results revealed that all ASR models overestimated the QuickSIN scores across most test conditions. Bland-Altman analyses showed that the Amazon ASR model had the least bias and the narrowest range for the limits of agreement, in comparison to the manual scoring by an experienced audiologist. ConclusionsSome ASR models, such as Amazon, demonstrated performance comparable to that of an audiologist in automatically scoring QuickSIN tests. However, further refinements are necessary to increase the robustness of the ASR models in scoring low SNR loss test conditions.

7

The Role of Spatial Separation on Selective and Distributed Attention to Speech

Pinto, D.; Agmon, G.; Zion Golumbic, E.

2020-01-28 neuroscience 10.1101/2020.01.27.920785 medRxiv

Top 0.1%

12.8%

Show abstract

AO_SCPLOWBSTRACTC_SCPLOWProcessing speech in multi-speaker environments poses substantial challenges to the human perceptual and attention system. Moreover, different contexts may require employing different listening strategies. For instance, in some cases individuals pay attention Selectively to one speaker and attempt to ignore all other task-irrelevant sounds, whereas other contexts may require listeners to Distribute their attention among several speakers. Spatial and spectral acoustic cues both play an important role in assisting listeners to segregate concurrent speakers. However, how these cues interact with varying demands for allocating top-down attention is less clear. In the current study, we test and compare how spatial cues are utilized to benefit performance on these different types of attentional tasks. To this end, participants listened to a concoction of two or four speakers, presented either as emanating from different locations in space or with no spatial separation. In separate trials, participants were required to employ different listening strategies, and detect a target-word spoken either by one pre-defined speaker (Selective Attention) or spoken by any of the speakers (Distributed Attention). Results indicate that the presence of spatial cues improved performance, particularly in the two-speaker condition, which is in line with the important role of spatial cues in stream segregation. However, spatial cues provided similar benefits to performance under Selective and Distributed attention. This pattern suggests that despite the advantage of spatial cues for stream segregation, they were nonetheless insufficient for directing a more focused attentional spotlight towards the location of a designated speaker in the Selective attention condition.

8

Task-Induced Mental Fatigue and Motivation Influence Listening Effort as Measured by the Pupil Dilation in a Speech-in-Noise Task

Alfandari Menase, D.; Richter, M.; Wendt, D.; Fiedler, L.; Naylor, G.

2022-01-05 otolaryngology 10.1101/2022.01.04.22268734 medRxiv

Top 0.1%

12.6%

Show abstract

ObjectivesListening effort and fatigue are common complaints among individuals with hearing impairment (HI); however, the underlying mechanisms, and relationships between listening effort and fatigue are not well understood. Recent quantitative research suggests that the peak pupil dilation (PPD), which is commonly measured concurrent to the performance of a speech-in-noise task as an index of listening effort, may be informative of daily-life fatigue, but it remains unknown whether the same is true for task-induce fatigue. As fatigue effects are known to manifest differently depending on motivation, the main aim of the present study was to experimentally investigate the interactive effects of task-induced fatigue and motivation on the PPD. DesignIn a pre-/post-fatigue within-subject design, 18 participants with normal hearing (NH) engaged in a 98-trial-long speech-in-noise task (the load sequence, approximately 40 min. long), which either excluded or included additional memory demands (light vs. heavy load sequence). Before and after the load sequence, baseline pupil diameter (BPD) and PPD were measured during shorter probe blocks of speech-in-noise tasks. In these probe blocks, if participants correctly repeated more than 60% of the keywords, they could win vouchers of either 20 or 160 Danish krones worth (low incentive vs. high incentive). After each probe block, participants reported their invested effort, tendency for quitting, and perceived performance. ResultsThe BPD in anticipation of listening declined from pre-to post-load sequence, suggesting an overall decrease in arousal, but the decline did not scale with the magnitude of the load sequence, nor with the amount of monetary incentive. Overall, there was larger pre-to post-load sequence decline in PPD when the load sequence was heavy and when the monetary incentives were low. Post-hoc analyses showed that the decline in PPD was only significant in the heavy-load sequence-low reward condition. The speech-in-noise task performance, self-reported effort, and self-reported tendency to quit listening did not change with the experimental conditions. ConclusionsThis is the first study to investigate the influence of task-induced fatigue on BPD and PPD. Whereas BPD was not sensitive to the magnitude of previous load sequence and monetary incentives, the decline in PPD from pre-to post-load sequence was significant after the heavy load sequence when the offered monetary incentives were low. This result supports the understanding that fatigue and motivation interactively influence listening effort.

9

Pupil size and EEG speech tracking as independent measures of listening effort and speech intelligibility

Iotzov, I.; Parra, L.

2023-08-02 neuroscience 10.1101/2023.07.31.551390 medRxiv

Top 0.1%

12.3%

Show abstract

Speech is hard to understand when there is background noise. Speech intelligibility and listening effort both affect our ability to understand speech, but the relative contribution of these subjective factors is hard to disentangle. Previous studies suggest that speech intelligibility could be assessed with EEG speech tracking and listening effort via pupil size. However, these measures may be confounded, because poor intelligibility may require a larger effort. To address this we developed a novel word-detection paradigm that allows for a rapid behavioral assessment of speech processing. In this paradigm words appear on the screen during continuous speech similar to closed captioning. In two listening experiments with a total of 51 participants we manipulated intelligibility with changing auditory noise levels and modulated effort by varying monetary reward. Increasing signal-to-noise ratio (SNR) improved detection performance along with EEG speech tracking, suggesting improved intelligibility. Additionally, we find larger pupil size with increased SNR, suggestive of increased effort. Surprisingly, when we modulated both reward and SNR, we found that reward modulated only pupil size, while SNR modulated only EEG speech tracking. We suggest that this new paradigm can be used to independently and objectively assess speech intelligibility and listening effort.

10

Pupil responses indicate task-relevance and (unsuccessful) inhibition of background sounds during a dual, continuous listening task

Fiedler, L.; Johnsrude, I.; Wendt, D.

2025-05-21 neuroscience 10.1101/2025.05.20.655069 medRxiv

Top 0.1%

12.2%

Show abstract

Auditory attention can be voluntarily directed towards a sound source or automatically captured by background sounds, which may be either relevant, such that the listener shifts their attention to them, or irrelevant such that the listener tries to ignore or inhibit them. The ability to switch focus to a relevant sound source while inhibiting an irrelevant one requires attentional control and is crucial for navigating busy auditory scenes. Objective measures of attentional control could be beneficial in clinical contexts, such as fitting hearing aids. In a dual-task paradigm, we investigated whether pupil responses reflect relevance-dependent attentional selectivity. Participants with self-reported normal hearing (N = 21, Age: 27 to 66 years, pure tone average: -4 to +26 dB HL) listened to continuous speech from the front (primary task) while background sounds, consisting of cue names followed immediately by two-digit numbers, were presented from the left and right. The participant was told that one side, either right or left, was relevant and the other, irrelevant. The secondary task involved memorizing and later recognizing numbers from the relevant side. We observed increased pupil responses to sounds from the relevant side compared to the irrelevant side, indicating selectivity. Exploratory analysis showed that participants who exhibited stronger selectivity recognized more numbers correctly. Interestingly, pupil responses did not differ between hits and misses, but a stronger response to stream confusions versus correct rejections was found, suggesting that participants were more challenged by inhibiting irrelevant sounds than shifting attention to relevant sounds. In sum, our findings demonstrate that pupillometry provides valuable insights into attentional control abilities.

11

A Sensory-Cognitive Dissociation in Listeners with Hearing Difficulties: An Exploratory Analysis Linking Tinnitus to Binaural Unmasking Deficits and Speech Complaints to Memory

Bleeck, S.; Hamza, Y.

2025-12-19 otolaryngology 10.64898/2025.12.18.25342552 medRxiv

Top 0.1%

12.2%

Show abstract

BackgroundThe construct of Hidden Hearing Loss (HHL) proposes a link between patient-reported hearing difficulties and underlying neural deficits not captured by the standard audiogram. However, the heterogeneity of this population challenges the utility of HHL as a unitary diagnosis. This study presents an exploratory analysis aimed at deconstructing the HHL symptom complex. MethodsIn 30 participants with a range of hearing abilities and complaints, we measured binaural unmasking using the Binaural Intelligibility Level Difference (BILD). We employed a two-stage analysis. First, a "lumping" analysis tested whether participants could be grouped into a unitary "HHL profile" that predicted a BILD deficit, using both theory-driven classification and data-driven clustering. Second, after this approach failed, a pre-planned exploratory "splitting" analysis used a Linear Mixed-Effects Model (LMM) to investigate whether individual clinical markers (tinnitus, self-reported speech difficulty) were independently associated with the BILD. ResultsThe "lumping" analyses failed to find a significant difference in the BILD between subgroups, questioning the utility of a unitary HHL profile. In contrast, the exploratory "splitting" analysis found a significant interaction between tinnitus and listening condition ({beta} = 1.57, p = 0.009), suggesting that participants with tinnitus exhibited a smaller BILD. The complaint of speech perception difficulty was not significantly associated with a BILD deficit (p = 0.086) but was associated with lower scores on a test of short-term memory (forward digit span, p = 0.046). ConclusionOur findings challenge the value of a unitary HHL profile for predicting this specific binaural deficit. Instead, our exploratory analysis generated a specific, testable hypothesis of a sensory-cognitive dissociation: in our sample, tinnitus was associated with a reduced capacity for binaural unmasking, while the complaint of speech difficulty was associated with poorer short-term memory. These preliminary findings, derived from post-hoc analysis of an underpowered study, require rigorous validation in larger, pre-registered studies.

12

A standardised test to evaluate audio-visual speech intelligibility in French

Le Rhun, L.; Llorach, G.; Delmas, T.; Suied, C.; Arnal, L.; Lazard, D.

2023-01-18 otolaryngology 10.1101/2023.01.18.23284110 medRxiv

Top 0.1%

10.8%

Show abstract

ObjectiveLipreading, which plays a major role in the communication of the hearing impaired, lacked a French standardised tool. Our aim was to create and validate an audio-visual (AV) version of the French Matrix Sentence Test (FrMST). DesignVideo recordings were created by dubbing the existing audio files. SampleThirty-five young, normal-hearing participants were tested in auditory and visual modalities alone (Ao, Vo) and in AV conditions, in quiet, noise, and open and closed-set response formats. ResultsLipreading ability (Vo) varied from 1% to 77%-word comprehension. The absolute AV benefit was 9.25[L]dB SPL in quiet and 4.6[L]dB SNR in noise. The response format did not influence the results in the AV noise condition, except during the training phase. Lipreading ability and AV benefit were significantly correlated. ConclusionsThe French video material achieved similar AV benefits as those described in the literature for AV MST in other languages. For clinical purposes, we suggest targeting SRT80 to avoid ceiling effects, and performing two training lists in the AV condition in noise, followed by one AV list in noise, one Ao list in noise and one Vo list, in a randomised order, in open or close set-format.

13

Multilevel Modelling of Gaze from Hearing-impaired Listeners following a Realistic Conversation

Shiell, M. M.; Christensen, J. H.; Skoglund, M.; Keidser, G.; Zaar, J.; Rotger-Griful, S.

2022-11-09 neuroscience 10.1101/2022.11.08.515622 medRxiv

Top 0.1%

10.6%

Show abstract

PurposeThere is a need for outcome measures that predict real-world communication abilities in hearing-impaired people. We outline a potential method for this and use it to answer the question of when, and how much, hearing-impaired listeners look towards a new talker in a conversation. MethodTwenty-two older hearing-impaired adults followed a pre-recorded two-person audiovisual conversation in the presence of babble noise. We compared their eye-gaze direction to the conversation in two multilevel logistic regression (MLR) analyses. First, we split the conversation into events classified by the number of active talkers within a turn or a transition, and we tested if these predicted the listeners gaze. Second, we mapped the odds that a listener gazed towards a new talker over time during a conversation transition. ResultsWe found no evidence that our conversation events predicted changes in the listeners gaze, but the listeners gaze towards the new talker during a silent-transition was predicted by time: The odds of looking at the new talker increased in an s-shaped curve from at least 0.4 seconds before to 1 second after the onset of the new talkers speech. A comparison of models with different random effects indicated that more variance was explained by differences between individual conversation events than by differences between individual listeners. ConclusionMLR modelling of eye-gaze during talker transitions is a promising approach to study a listeners perception of realistic conversation. Our experience provides insight to guide future research with this method.

14

Prosocial reward relates to speech reception thresholds and age relates to subjective ratings of speech perception in noise

Oakeson, R. J.; Herbert, J.; Roper, S.; Zhang, H.; Rosen, S.; Scott, S. K.

2024-10-03 neuroscience 10.1101/2024.10.02.616312 medRxiv

Top 0.1%

10.4%

Show abstract

I.Motivation plays an important role in a listeners effort when in noisy environments. However, there remain gaps in the previous literature pertaining to the social psychological factors underlying motivation and listening effort. To fill these gaps, this study explored how prosociality and social reward relate to speech perception in noise (SPiN) in a group of normal hearing English speakers (n = 136; mean age: 29.6, age range: 18-68). We investigated SPiN performance and subjective listening experiences across different speech masking conditions: 1-speaker, 2-speaker, and speech-spectrum shaped noise (SSN), along with a working memory task, and questionnaires pertaining to social orientation. Results indicated a robust effect of different maskers, and that individuals who rated themselves higher in prosocial traits performed better in the masker that yielded the highest threshold out of the three conditions, the 2-speaker condition. Additionally, subjective ratings of listening effort particularly in the 1-speaker condition related to age, where older participants reported greater effort. These findings highlight prosociality and age as important social psychological factors influencing SPiN performance and listening effort, respectively, particularly in complex listening scenarios.

15

Extending the audiogram with loudness growth: revealing complementarity in bimodal aiding

Lambriks, L.; Van Hoof, M.; George, E.; Devocht, E.

2022-10-27 otolaryngology 10.1101/2022.10.24.22281443 medRxiv

Top 0.1%

10.3%

Show abstract

IntroductionClinically, the audiogram is the most commonly used measure when evaluating hearing loss and fitting hearing aids. As an extension, we present the loudness audiogram, which does not only show auditory thresholds but also visualises the full course of loudness perception. MethodsIn a group of 15 bimodal users, loudness growth was measured with the cochlear implant and hearing aid separately using a loudness scaling procedure. Loudness growth curves were constructed, using a novel loudness function, for each modality and then integrated in a graph plotting frequency, stimulus intensity level, and loudness perception. Bimodal benefit, defined as the difference between wearing a cochlear implant and hearing aid together versus wearing only a cochlear implant, was assessed for multiple speech outcomes. ResultsLoudness growth was related to bimodal benefit for speech understanding in noise and to some aspects of speech quality. No correlations between loudness and speech in quiet were found. Patients who had predominantly unequal loudness input from the hearing aid, gained more bimodal benefit for speech understanding in noise compared to those patients whose hearing aid provided mainly equivalent input. DiscussionFitting the cochlear implant and a contralateral hearing aid to create equal loudness at all frequencies may not always be beneficial for speech understanding.

16

Development and clinical application of a consonant confusion task to evaluate hearing aid benefit

Hajicek, J.; Harris, S. E.; Neely, S. T.

2026-04-24 otolaryngology 10.64898/2026.04.23.26351598 medRxiv

Top 0.1%

10.2%

Show abstract

PurposeThis research sought to develop a low-cognitive-load speech-in-noise test based on consonant confusions with the potential for assessing hearing-aid benefit. MethodsVowel-consonant-vowel (VCV) stimuli with added speech-shaped noise were presented as a closed-set consonant identification task. Initially, consonant-confusion matrices were used to select, from a larger set of consonants and vowel contexts, a set of ten consonants and associated signal-to-noise ratios (SNR) that were sensitive to hearing loss. The sensitivity of the qVCV test to hearing loss was validated by comparing predicted pure-tone average (PTA) hearing thresholds with their audiometric PTA. Clinical viability of the qVCV test was assessed by comparisons to the QuickSIN test. Hearing-aid benefit was assessed by comparing test scores in unaided and aided conditions. ResultsThe consonants most sensitive to hearing loss were /b d g t k v z s [esh] n/ in the vowel context /[a]/. A cross-validated prediction of PTA had a mean-absolute error of 5.7 dB. The repeatability of qVCV at 50 trials was equivalent to the QuickSIN average of two lists. Hearing-aid benefit was quantified as a decibel reduction in hearing loss. ConclusionsqVCV and QuickSIN performed similarly when test times are equated. The advantages of qVCV include lower cognitive demand, fewer learning effects, and automated scoring. PTA predicted by qVCV which greatly exceeds audiometric PTA may indicate either cognitive deficits or cochlear neural degeneration. The qVCV quantification of hearing-aid benefit may have clinical value.

17

Face masks impair reconstruction of acoustic speech features and higher-level segmentational features in the presence of a distractor speaker

Haider, C. L.; Suess, N.; Hauswald, A.; Park, H.; Weisz, N.

2021-09-30 neuroscience 10.1101/2021.09.28.461909 medRxiv

Top 0.1%

9.1%

Show abstract

Multisensory integration enables stimulus representation even when the sensory input in a single modality is weak. In the context of speech, when confronted with a degraded acoustic signal, congruent visual inputs promote comprehension. When this input is occluded speech comprehension consequently becomes more difficult. But it still remains inconclusive which levels of speech processing are affected under which circumstances by occlusion of the mouth area. To answer this question, we conducted an audiovisual (AV) multi-speaker experiment using naturalistic speech. In half of the trials, the target speaker wore a (surgical) face mask, while we measured the brain activity of normal hearing participants via magnetoencephalography (MEG). We additionally added a distractor speaker in half of the trials in order to create an ecologic difficult listening situation. A decoding model on the clear AV speech was trained and used to reconstruct crucial speech features in each condition. We found significant main effects of face masks on the reconstruction of acoustic features, such as the speech envelope and spectral speech features (i.e. pitch and formant frequencies), while reconstruction of higher level features of speech segmentation (phoneme and word onsets) were especially impaired through masks in difficult listening situations. As we used surgical face masks in our study, which only show mild effects on speech acoustics, we interpret our findings as the result of the occluded lip movements. This idea is in line with recent research showing that visual cortical regions track spectral modulations. Our findings extend previous behavioural results, by demonstrating the complex contextual effects of occluding relevant visual information on speech processing. HighlightsO_LISurgical face masks impair neural tracking of speech features C_LIO_LITracking of acoustic features is generally impaired, while higher level segmentational features show their effects especially in challenging listening situations C_LIO_LIAn explanation is the prevention of a visuo-phonological transformation contributing to audiovisual multisensory integration C_LI

18

Low noise HRTFs and delay line corrections are detrimental to the prediction of ITD discrimination thresholds from environmental statistics

Camperos, M. J. G.; Goncalves, T. C.; Marin, B.; Pavao, R.

2022-09-10 neuroscience 10.1101/2022.09.09.507313 medRxiv

Top 0.1%

8.9%

Show abstract

Interaural Time Difference (ITD) is the main cue for azimuthal auditory perception in humans. ITDs at each frequency contribute differently to azimuth discrimination, which can be quantified by their azimuthal Fisher Information. Consistently, human ITD discrimination thresholds are predicted by the azimuthal information. However, this prediction is poor for frequencies below 500 Hz. Such poor prediction could be ascribed to the strategy of quantifying azimuthal information using HRTFs obtained in unnaturalistic anechoic chambers or by using a direct method which does not incorporate the delay lines proposed by the Jeffress-Colburn model. In the present study, we obtained ITD discrimination thresholds from extensive sampling across frequency and ITD, and applied multiple strategies for quantifying azimuthal information. These strategies employed HRTFs obtained in realistic and anechoic chambers, with and without considering delay lines. We found that ITD discriminability thresholds across the complete range of frequencies are better predicted by azimuthal information conveyed by ITD cues when (1) we use naturalistic high-noise HRTFs, and (2) ITD delay compensation is not applied. Our results support that auditory perception is shaped by natural environments, which include high reverberation in low frequencies. Moreover, we also suggest that delay lines are not a crucial feature for determining ITD discrimination thresholds in the human auditory system. GRAPHICAL ABSTRACT O_FIG O_LINKSMALLFIG WIDTH=200 HEIGHT=186 SRC="FIGDIR/small/507313v1_ufig1.gif" ALT="Figure 1"> View larger version (28K): org.highwire.dtl.DTLVardef@1540172org.highwire.dtl.DTLVardef@2aea34org.highwire.dtl.DTLVardef@175ff15org.highwire.dtl.DTLVardef@1bdf2d1_HPS_FORMAT_FIGEXP M_FIG C_FIG

19

Relevance of Auditory Errors Decreases When Errors Are Introduced Suddenly

Chao, S.-C.; Daliri, A.

2021-08-10 neuroscience 10.1101/2021.08.09.455646 medRxiv

Top 0.1%

8.7%

Show abstract

PurposeWhen the speech motor system encounters errors, it generates adaptive responses to compensate for the errors. We previously showed that adaptive responses to task-irrelevant errors are significantly smaller than responses to task-relevant errors when errors are introduced gradually. The current study aimed to examine responses to task-irrelevant and task-relevant errors when errors are introduced suddenly. MethodWe used an adaptation paradigm in which participants experienced task-relevant errors (induced by formant-shift perturbations) and task-irrelevant errors (induced by formant-clamp perturbations). For one group of participants (N = 30), we applied the perturbations gradually. The second group of participants (N = 30) received the perturbations suddenly. We designed the perturbations based on participant-specific vowel configurations such that a participants first and second formants of /{varepsilon}/ were perturbed toward their /ae/. To estimate adaptive responses, we measured formant changes (within 0-100 ms of the vowel onset) in response to the formant perturbations. ResultsWe found that (1) the difference between adaptive responses to formant-shift and formant-clamp perturbations was the smallest for the suddenly introduced perturbations, and (2) responses to formant-shift perturbations positively correlated with responses to formant-clamp perturbations for the suddenly (but not gradually) introduced perturbations. ConclusionsThese results showed that the speech motor system responds to task-relevant errors and task-irrelevant errors more differently when errors are introduced gradually than suddenly. Overall, the speech motor system evaluates the relevance of errors and uses its evaluation to modulate its adaptive responses to errors.

20

Optimal parameters for measuring multiband auditory brainstem responses to continuous speech

Polonenko, M. J.; Eisenreich, B. R.

2025-12-26 neuroscience 10.64898/2025.12.24.696406 medRxiv

Top 0.1%

8.5%

Show abstract

Accurate clinical hearing assessment depends on efficient, engaging measures designed to evaluate ecologically relevant stimuli. Often brief tones or narrowband noise stimuli are used, providing a useful but limited snapshot of hearing function. Including dynamic speech offers a means to capture how the hearing system encodes complex sounds critical for everyday communication. Here we describe the optimal parameters for using audiobook continuous speech with the multiband peaky speech paradigm to measure frequency-specific auditory brainstem responses (ABRs) to standard audiological octave bands from 500-8000 Hz in each ear simultaneously. Using computational modeling and direct human ABR testing in adults with normal hearing, we demonstrate that continuous speech signals with a chirp phase profile and fundamental frequency (f0) lowered to the range of 90-110 Hz evoke the largest ABR wave V amplitudes. This amplitude boost occurs when any narrators f0 is lowered to this optimal range, but the largest responses occur for narrators with original f0s below 170 Hz. We also confirmed that different narrator speech stimuli with these optimized parameters can evoke similarly sized ABRs, but some minor differences remain for testing time. Ultimately, optimizing phase-f0 parameters substantially sped up the median testing time to obtain robust audiobook-based multiband ABRs to within 14 minutes, thereby making this paradigm more feasible for future research and clinical translation.